Transforming an embodied conversational agent into an efficient talking head: from keyframe-based animation to multimodal concatenation synthesis

Authors

  • Guillaume Gibert
  • Kirk N. Olsen
  • Yvonne Leung
  • Catherine J. Stevens
Abstract

BACKGROUND: Virtual humans have become part of everyday life (movies, the internet, and computer games). Although they are becoming increasingly realistic, their speech capabilities are usually limited: facial movements are often incoherent with, and/or out of synchrony with, the corresponding acoustic signal.

METHODS: We describe a method to convert a virtual human avatar, animated through key frames and interpolation, into a more naturalistic talking head. Speech articulation cannot be accurately replicated by interpolating between key frames; talking heads with good speech capabilities are instead derived from real speech production data. Motion capture is commonly used to record accurate facial motion for the visible speech articulators (jaw and lips) synchronously with the acoustics. To access tongue trajectories (a partially occluded speech articulator), electromagnetic articulography (EMA) is often used. We recorded a large database of phonetically balanced English sentences with synchronous EMA, motion capture data, and acoustics. An articulatory model was computed from this database to recover missing data and to provide 'normalized' animation (i.e., articulatory) parameters. In addition, semi-automatic segmentation was performed on the acoustic stream. A dictionary of multimodal Australian English diphones was created; each entry stores the variation of the articulatory parameters between two successive stable allophones.

RESULTS: The avatar's facial key frames were converted into articulatory parameters steering its speech articulators (jaw, lips, and tongue). The speech production database was used to drive the Embodied Conversational Agent (ECA) and to enhance its speech capabilities. A Text-To-Auditory Visual Speech synthesizer was created based on the MaryTTS software and on the diphone dictionary derived from the speech production database.

CONCLUSIONS: We describe a method to transform an ECA with a generic tongue model and key-frame animation into a talking head that displays naturalistic tongue, jaw, and lip motions. Using the multimodal speech production database, a Text-To-Auditory Visual Speech synthesizer drives the ECA's facial movements, enhancing its speech capabilities.
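To make the concatenation step concrete, here is a minimal Python sketch of how a diphone dictionary of articulatory trajectories could be time-warped and joined to follow phoneme durations predicted by a TTS front end such as MaryTTS. It is an illustration under assumed conventions, not the authors' implementation: the 'p1-p2' key format, the (frames, n_params) array layout, the 100 Hz frame rate, the linear resampling, and the resample/synthesize helpers are all hypothetical.

    import numpy as np

    # Hypothetical sketch: data layout, key format, and frame rate are
    # assumptions for illustration, not taken from the paper.

    def resample(traj, n_frames):
        """Linearly time-warp a (frames, n_params) trajectory to n_frames,
        so a stored diphone matches the duration predicted by the TTS."""
        src = np.linspace(0.0, 1.0, len(traj))
        dst = np.linspace(0.0, 1.0, n_frames)
        return np.stack(
            [np.interp(dst, src, traj[:, k]) for k in range(traj.shape[1])],
            axis=1,
        )

    def synthesize(phones, durations_s, diphone_dict, fps=100):
        """Concatenate diphone trajectories for a phone sequence.

        phones       -- phoneme labels from the TTS front end
        durations_s  -- one duration (seconds) per diphone
        diphone_dict -- maps 'p1-p2' to a (frames, n_params) array holding
                        the articulatory-parameter variation between two
                        successive stable allophones
        Returns a (total_frames, n_params) array of articulatory parameters.
        """
        segments = []
        for (p1, p2), dur in zip(zip(phones, phones[1:]), durations_s):
            traj = diphone_dict[f"{p1}-{p2}"]          # stored trajectory
            n_frames = max(2, int(round(dur * fps)))   # target length
            segments.append(resample(traj, n_frames))
        # Naive join; a real system would smooth across boundaries,
        # e.g. by cross-fading a few frames at each concatenation point.
        return np.concatenate(segments, axis=0)

Because the avatar's key frames were converted into the same articulatory parameter space, a parameter stream produced this way is the kind of signal that can steer the ECA's jaw, lip, and tongue controls directly.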

Similar articles

On Combining the Facial Movements of a Talking Head

We present work on Obie, an embodied conversational agent framework. An embodied conversational agent, or talking head, consists of three main components. The graphical part consists of a face model and a facial muscle model. Besides the graphical part, we have implemented an emotion model and a mapping from emotions to facial expressions. The animation part of the framework focuses on the comb...

Head X: Customizable Audiovisual Synthesis for a Multi-purpose Virtual Head

The development of embodied conversational agents (ECAs) involves a wide range of cutting-edge technologies extending from multimodal perception to reasoning to synthesis. While each is important to a successful outcome, it is the synthesis that has the most immediate impact on the observer. The specific appearance and voice of an embodied conversational agent (ECA) can be decisive factors in m...

A 3D audio-visual animated agent for expressive conversational question answering

This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: 1/ perceptual experiments (e.g., perception of expressivity and 3D movements in both audio and visual channels); 2/ design of human-computer interfaces requiring head mo...

A Controller-Based Animation System for Synchronizing and Realizing Human-Like Conversational Behaviors

Embodied Conversational Agents (ECAs) are an application of virtual characters that is the subject of considerable ongoing research. An essential prerequisite for creating believable ECAs is the ability to describe and visually realize multimodal conversational behaviors. The recently developed Behavior Markup Language (BML) seeks to address this requirement by providing a means to specify physi...

SmartBody: behavior realization for embodied conversational agents

Researchers demand much from their embodied conversational agents (ECAs), requiring them to be both life-like and responsive to events in an interactive setting. We find that a flexible combination of animation approaches may be needed to satisfy these needs. In this paper we present SmartBody, an open source modular framework for animating ECAs in real time, based on the notion of hier...

Journal title:

Volume 1, Issue:

Pages: -

Publication date: 2015